Streptococcus thermophilus ASCC 1275 (ST 1275), a typical dairy starter bacterium, yields the highest
known amount (~1,000 mg/L) of exopolysaccharide (EPS) in milk among strains of S. thermophilus.
Adding this starter to milk fermentation modifies the texture of fermented dairy foods such as yogurt
and cheese, with EPS as its important metabolite. In this genomic study, we report a novel eps gene
cluster for the assembly of EPS repeating units. It contains two pairs of epsC-epsD genes, which are
assigned to determine the chain length of EPS. This also suggests that this organism produces two
types of EPS, capsular and ropy, as observed in our previous studies. Additionally, ST 1275 appears
to have an effective proteolytic system and sophisticated stress response systems, and has the
highest number of separate CRISPR/Cas loci (four) among sequenced S. thermophilus strains. These
features may be conducive to the adaptation of this starter to milk and to defense against
undesirable bacteriophage infections, which lead to failure of milk fermentation. Insights into the
genome of ST 1275 suggest that this strain may be a model high EPS-producing dairy starter.



Conventional dairy starter bacteria, including Streptococcus thermophilus, Lactobacillus delbrueckii subsp.
bulgaricus and Lactococcus lactis, have a long history of use in the home-made and modern manufacture of
fermented dairy foods such as yogurt and cheese 1,2 . These dairy starters ferment milk lactose to
lactic acid, which decreases the pH to 4.5-4.7 and results in the coagulation of milk proteins 3,4 . Among
these important conventional starters, S. thermophilus is a non-pathogenic, homofermentative facultative
anaerobe that is used for the manufacture of yogurt and certain types of cheese. There has been increasing
interest in using novel EPS-producing S. thermophilus strains to enhance the functionality of yogurt and cheeses 5-9 .

As of April 2014, six strains of S. thermophilus had been fully sequenced and their whole-genome sequence
data released in the NCBI Genome database 10-14 . Comparative genome analysis of dairy S. thermophilus
suggests that proteolytic activity, nitrogen metabolism, sugar utilization and transporter systems play crucial
roles in adaptation to milk environments 7,12,15 . Dairy S. thermophilus owes its "generally recognized as
safe" status to loss-of-function events, such as the decay and loss of virulence determinants during
evolution; in addition, both lateral gene transfer (LGT) and natural competence have contributed to shaping
the S. thermophilus genome. This kind of evolution results in diverse metabolic activities and gives new
functionalities to dairy foods 10,16 . Common features of dairy S. thermophilus include rapid acidification
of milk, acid tolerance, bacteriocin synthesis, lactose utilization, production of formic and folic acids,
innate and adaptive immunity, bacteriophage resistance and, most importantly, exopolysaccharide (EPS)
biosynthesis 7,15 . These features are important for the use of dairy S. thermophilus as a starter bacterium
in milk fermentation.

Extracellular polysaccharide, also known as exopolysaccharide (EPS), produced by lactic acid bacteria (LAB)
including S. thermophilus is generally regarded as food-grade because it is naturally produced 5,8,9 . EPS may
be secreted into the medium as ropy EPS, or may remain attached to the cell surface as capsular
EPS 8 . EPS has been reported to improve the viscosity and texture of yogurt and some cheeses, and to prevent
syneresis in yogurt 5,8,17-21 . Moreover, EPS produced by dairy LAB can replace chemically modified starches
or milk fat in commercial yogurt, especially set-type yogurt, contributing rheological effects, mouthfeel
and creaminess to fermented milk products 5,20,21 . Certain EPSs have also been reported to have important
probiotic characteristics such as immunostimulatory properties, antioxidative effects and antimicrobial
activities against pathogens 22-24 .

In general, the EPS yield of most S. thermophilus strains varies from 20 mg/L to 600 mg/L in
milk-based medium under optimal conditions 9,25 . Among all reported EPS yields for S. thermophilus,
S. thermophilus ASCC 1275 (ST 1275) produced the highest known amount of EPS (~1,029 mg/L), in milk
medium supplemented with 0.5% whey protein concentrate when fermentation was carried out at pH 5.5
and 37°C for 24 h 26 . Moreover, ST 1275 produced both capsular and ropy EPS 20,27 . It has been
documented that capsular EPS does not cause ropiness in milk products, whereas ropy EPS contributes
to the enhanced texture of milk products 28 . Our previous studies have shown that the high amount
of EPS produced by ST 1275 modified the texture of Mozzarella cheese and yogurt 17-21 .
Additionally, the use of ST 1275 for milk fermentation contributed to the development of low-fat or
fat-free yogurt and Mozzarella cheese 17,18,20,29-32 . Thus, any effort to increase EPS yield in
milk would be of great significance for enhancing the functionality of fermented dairy foods.

The assembly of EPS repeating units is determined by the eps gene cluster, which has been described
in detail in certain species of LAB and shows diverse gene structures 33,34 . Although the eps gene
clusters of six sequenced strains of S. thermophilus have been released 10-14 , data on their EPS
yields remain unavailable; this may be due to the commercial nature of these strains or to low EPS
yields. Hence, our understanding of high EPS-producing S. thermophilus at the genomic level is still
limited. Building on our previous studies of the high EPS yield of ST 1275 in milk, we used ST 1275
in the current study as a model dairy starter to investigate, at the genomic level, the mechanism of
high EPS yield in S. thermophilus.

Results 

Genome sequencing and assembly. The ST 1275 genome was sequenced with one shotgun run and one 8 kb-span
paired-end run on a 454 Roche GS Junior System. A total of 72,487,271 bases from 158,162 raw shotgun
reads and 56,596,072 bases from 152,819 raw paired-end reads were assembled into 65 contigs and 4
scaffolds, giving an average sequencing depth of ~62-fold. The de novo draft assembly comprised 4
scaffolds containing 44 large contigs with a contig N50 of 100,486 bp, indicating that the assembly was
highly contiguous. Only three gaps were found at the junctions between contigs, and these were filled
by conventional PCR and Sanger sequencing. This de novo shotgun and paired-end pyrosequencing approach
provides high sequencing depth for a microbial genome.
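
The two assembly statistics quoted in this paragraph, contig N50 and sequencing depth, can be illustrated with a minimal Python sketch. The contig lengths below are hypothetical, since the individual contig lengths are not listed in the text; the base counts and genome size are the reported values. Dividing all raw bases by the genome size gives roughly 70-fold, somewhat above the reported ~62-fold. One possible reason, stated here only as an assumption, is that not every raw base was incorporated into the assembly.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def raw_depth(total_bases, genome_size):
    """Upper-bound depth: assumes every raw base maps to the genome."""
    return total_bases / genome_size


# Hypothetical contig lengths (bp), for illustration only.
example_contigs = [100, 80, 60, 40, 20]
example_n50 = n50(example_contigs)

# Base counts and genome size as reported in the text.
shotgun_bases = 72_487_271
paired_end_bases = 56_596_072
genome_size = 1_845_495
depth = raw_depth(shotgun_bases + paired_end_bases, genome_size)

print(f"example N50 = {example_n50} bp, raw depth = {depth:.1f}x")
```

Sorting contigs from longest to shortest and accumulating their lengths is the usual way to obtain N50; the depth function is deliberately naive and should be read as an upper bound.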

General features of ST 1275. The complete circular genome of ST 1275, which is plasmid-free, is
1,845,495 bp with an average GC content of 39.06% (Fig. 1). General features of the ST 1275 genome
are compared with those of five other sequenced S. thermophilus strains in Table 1. Compared with the
other sequenced S. thermophilus strains, ST 1275 possessed the lowest numbers of rRNA operons (5)
and tRNAs (55). Moreover, the highest number of separate CRISPR/Cas loci (four) was found in its
genome, suggesting that this organism may have better adaptive immunity against various
bacteriophage infections.
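
For readers who want to reproduce a figure such as the average GC content from a genome sequence, a minimal sketch follows. It is not the method used in this study, which is not described; the function name and the toy sequence are hypothetical, and ambiguous bases are simply ignored.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / (gc + at)


# Hypothetical 10-bp sequence, for illustration only (4 of 10 bases are G or C).
toy_sequence = "ATGCGCATTA"
toy_gc = gc_content(toy_sequence)
print(f"GC content = {toy_gc:.1f}%")
```

Applied to the full chromosome sequence (GenBank accession CP006819), the same calculation would be expected to give a value close to the reported 39.06%.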

The functional annotations of ST 1275 and the five other sequenced S. thermophilus strains are shown
in Fig. 2. In general, no major differences were found in the number of genes in each functional
group. In all six strains, the three functional groups with the most genes were those associated
with protein metabolism, amino acids and carbohydrate metabolism. This indicates that these three
functional groups are closely associated with the adaptation of S. thermophilus to the milk
environment, whose nutrients include milk proteins and lactose.



Carbohydrate utilization and sugar transport system. Sugar uptake, transport systems and sugar
hydrolases in ST 1275 are shown in Supplementary Table 1. The part of sugar metabolism involved in
nucleotide sugar biosynthesis is shown in Fig. 3. Our previous studies demonstrated that this
organism metabolizes lactose into lactic acid efficiently, resulting in rapid acidification of milk
(pH 4.5-4.7) within 8 h of milk fermentation 26 . This is the pH at which coagulation of milk takes
place and, importantly, 8 h is an acceptable fermentation period for industrial processing. In
addition to utilizing lactose, galactose and glucose, ST 1275 appears to be able to ferment mannose
and fructose (Supplementary Table 1 and Fig. 3). However, sucrose, mannose and fructose are the
only three sugars that may be transported by specific phosphoenolpyruvate-dependent
phosphotransferase systems (PEP-PTS), while lactose- and glucose-specific PEP-PTS are not present
in ST 1275. Since lactose is the main sugar in milk, rapid acidification of milk by this starter
depends heavily on lactose utilization during milk fermentation.

In contrast to the limited number of amylose hydrolases in other sequenced S. thermophilus strains,
intact genes for one α-amylase, one glucanhydrolase, three glycogen debranching proteins and two
alkaline amylopullulanases were found in ST 1275 genome (Supplementary Table 1). This suggests that
this organism may have efficient amylolytic activity to break down starch 35 . This may be
important for achieving high cell density in fermentations that use amylose as a cheap source of
carbohydrate.

EPS biosynthesis and comparison of eps gene cluster. All essential components for EPS production,
including complete nucleotide sugar biosynthesis (Fig. 3) and a novel eps gene cluster for EPS
assembly (Fig. 4), were found in ST 1275 genome. This starter contains the highly conserved
epsA-epsB, assigned to the regulation of biosynthesis, and eps1C-eps1D, assigned to determining
the chain length of EPS 12,36 . The epsE gene encodes a membrane-associated priming
glycosyltransferase, which does not catalyze a glycosidic linkage but transfers a
sugar-1-phosphate to the undecaprenyl-phosphate lipid carrier on the cytoplasmic face of the
membrane 34,37 . Subsequently, the glycosyltransferases encoded by epsF, epsG, epsH, epsI, epsJ
and epsK may transfer various nucleotide sugars, including UDP-glucose, UDP-galactose,
dTDP-rhamnose, UDP-GlcNAc and UDP-galactofuranose, to form the repeating units in a glycosidic
linkage-dependent manner 34,37 . Additionally, a unique UDP-galactopyranose mutase for the synthesis
of UDP-galactofuranose was found in this cluster. However, the chemical structure and sugar
composition of the repeating unit remain to be determined. Remarkably, we found for the first time
an additional eps2C-eps2D pair in this cluster, which may also be involved in chain length
determination. Polymerization and translocation of repeating units are assigned to epsL and epsN,
respectively. epsO and epsP together are possibly responsible for phosphorylation events, while
epsQ is assigned to the transfer of EPS between the membrane and the peptidoglycan layer. It has
been documented that the orf14.9 gene, present in the eps gene clusters of all six strains
(Fig. 4), is associated with the cell growth of S. thermophilus 38 .

In general, nucleotide sugar biosynthesis is one of the two factors determining EPS yield, while the
eps gene cluster, which governs the assembly of EPS repeating units, is the other key factor in lactic
acid bacteria (LAB). However, eps gene clusters of various structures have been found in LAB,
indicating that the production and chemical structure of EPS are strain-specific 33,34 .
Interestingly, the occurrence of two gene pairs, namely eps1C-eps1D and eps2C-eps2D, for
determining the chain length of EPS in ST 1275 genome implies that this starter may produce EPSs of
different molecular sizes. This is consistent with our previous finding that ST 1275 produces both
capsular and ropy EPS 20,27 .

Proteolytic system. Milk is known to be a poor source of carbon and free amino acids, but contains
an abundance of proteins such as casein. It was found that an extracellular proteinase (known as
PrtS), membrane transporters and intracellular peptidases contribute to the utilization of
exogenous proteins by S. thermophilus in milk 7,12 . Hence, the proteolytic system of ST 1275 plays a
crucial role in its adaptation to milk. As extracellular proteinase, ST 1275 encodes one intact
PrtS (T303_05205), which is involved in the cleavage of casein into oligopeptides and is found only
in some strains of S. thermophilus. This is a key component for cell growth in milk 39-41 .
Oligopeptides and free amino acids are then transported into cells by membrane amino acid/peptide
transporters. Remarkably, an abundance of intracellular proteases and peptidases was found in ST
1275 (Supplementary Table 2). These help ST 1275 cells break down oligopeptides into free amino
acids for cellular metabolism or for direct utilization.

Figure 1 | Circular genome map of the S. thermophilus ASCC 1275 chromosome. The genome of plasmid-free
ST 1275 is 1,845,495 bp with an average GC content of 39.1%. The circular genome map was generated with
the CGView Server 59 . The GenBank accession number for the ST 1275 genome is CP006819.

Two-component regulatory systems. The two-component regulatory systems (TCRSs) and related loci are
shown in Supplementary Table 3. It has been documented that TCRSs are closely associated with stress
and adaptive responses, bacteriocin biosynthesis, natural competence and biofilm formation 42,43 .
Seven intact TCRSs were found in ST 1275 (Supplementary Table 3). However, the functions of TCRSs
have been poorly characterized in S. thermophilus; most of them have unknown functions or are
involved in multiple cellular responses 7 .

Stress response systems. The acid resistance, cold and heat response, salt resistance and oxidative
stress response systems of ST 1275 are shown in Supplementary Table 4. These loci in ST 1275 genome
may play important roles in adapting cells to stressful conditions, such as the presence of oxygen,
heat, cold, acid and salt. In addition to the TCRSs in ST 1275, further stress regulators
(T303_00880 and T303_09015) may be involved in the regulation of adaptive cellular responses.

Like other sequenced S. thermophilus strains, ST 1275 contains almost the same number or types of
heat-shock and cold-shock proteins and of oxidative stress response-related genes for bacterial
fitness or performance.

For acid resistance, a proton-translocating F0F1-ATPase system and a urease system coupled with an
ammonia permease were found in ST 1275 genome (Supplementary Table 4). These may contribute to
internal pH homeostasis in this starter when it faces an extremely acidic environment, such as the
acids produced during milk fermentation. However, no loci encoding intact amino acid deiminase or
decarboxylase, which are also associated with the maintenance of internal pH in bacteria 44,45 , were
found in ST 1275 genome. Remarkably, among all the species of LAB the urease system is found only in
S. thermophilus, and it has been found to be effective for the control of internal pH homeostasis 46 .

Table 1 | Comparison of general genome features of sequenced S. thermophilus

Feature                                     ASCC 1275    LMD-9       CNRZ1066        LMG 18311       ND03         MN-ZLW-002
Origin of strain                            ASCRC        Danisco     Yogurt isolate  Yogurt isolate  Yili Group   Mengniu Group
                                            (Australia)  (USA)       (France)        (UK)            (China)      (China)
Size of chromosome (bp)                     1,845,495    1,856,368   1,796,226       1,796,846       1,831,949    1,848,520
No. of plasmids                             0            2           0               0               0            0
G + C content (%)                           39.1         39.1        39.1            39.1            39.0         39.1
No. of ORFs (by GLIMMER v3.02)              2,253        2,258       2,191           2,211           2,248        2,258
No. of genes                                1,959        2,004       1,999           1,973           2,038        2,046
No. of CDS                                  1,694        1,711       1,914           1,888           1,919        1,910
Coding density (%)                          77.85        79.01       90.69           88.69           88.08        87.29
No. of rRNA operons                         5            6           6               6               5            5
No. of tRNAs                                55           67          67              67              56           56
No. of CRISPR/Cas loci (by CRISPR finder)   4            3           1               2               3            3
GenBank accession                           CP006819     CP000419.1  CP000024.1      CP000023.1      CP002340.1   CP003499.1

Interestingly, several salt resistance-related genes were found in ST 1275 genome. Since S.
thermophilus is an essential starter for the manufacture of several common types of cheese, these
genes may help ST 1275 cells survive in or adapt to cheeses with a high salt content.

Defense system. The loci encoding bacteriocin biosynthesis, multidrug resistance genes and
competence proteins for natural transformation are shown in Supplementary Table 5. Lantibiotics are
commonly produced by S. thermophilus as antimicrobial weapons against other microbes such as
food-borne pathogens 47 . Additionally, several early and late competence genes were found in ST
1275 genome. Interestingly, it has been demonstrated that Ami (oligopeptide transporter), a signal
peptide and comX (sigma factor) are important for the induction of early competence development in
S. thermophilus 48-50 . Natural competence is closely associated with LGT, such as the acquisition
of novel genes, in S. thermophilus 2,16 .

Moreover, several genes for multidrug resistance (Supplementary Table 5), including two
β-lactamases, were found in its genome. However, genes encoding enzymes that hydrolyze β-lactam
antibiotics are very common in LAB and in recognized probiotics such as Lactobacillus rhamnosus
GG 51 . The β-lactamase genes may have been acquired via LGT during evolution, when β-lactam
antibiotics were commonly and widely used in the 20th century. Other multidrug ABC transporter
systems may be useful for removing cytotoxic compounds. Additionally, a mucus-binding protein
(T303_03820) was found in ST 1275 genome, which indicates that this organism may have potential as
a probiotic organism to colonize and survive in the human gastrointestinal tract, especially in
individuals exposed to β-lactam antibiotics.

CRISPR/Cas system against bacteriophage infection. Four separate CRISPR/Cas loci were predicted in
the genome of ST 1275 by the CRISPR finder online service (Fig. 5). Recently, the CRISPR/Cas system
has been documented as a prokaryotic defense system against bacteriophage infections. Bacteria have
several mechanisms against bacteriophage infection, such as encounter blocks, resistance to viral
adsorption, penetration blocks, restriction modification and the CRISPR/Cas system 52 . However,
the CRISPR/Cas system is widely distributed in prokaryotes as a form of adaptive immunity against
bacteriophage infection. In addition to innate immunity such as the restriction modification system
in dairy starters, adaptive immunity is very important for both the dairy and starter culture
industries to guard against phage infection, which causes failure of milk fermentation 52,53 .

Figure 2 | Comparison of functional annotation of ST 1275 and other five sequenced S. thermophilus using RAST server. The nucleotide sequences of
six sequenced S. thermophilus were uploaded into the RAST server based on SEED subsystems for functional annotations.

Figure 3 | Nucleotide sugars biosynthesis for EPS production in S. thermophilus ASCC 1275. The numbers refer to the enzymes involved: 1,
β-Galactosidase; 2, Glucokinase; 3, Phosphoglucomutase; 4, UDP-glucose pyrophosphorylase; 5, UDP-glucose 4-epimerase; 6, UDP-galactose 4-epimerase;
7, Galactose 1-phosphate uridylyltransferase; 8, Galactose mutarotase; 9, Galactokinase; 10, dTDP-glucose pyrophosphorylase; 11, dTDP-glucose
4,6-dehydratase; 12, dTDP-4-keto-6-deoxy-glucose 3,5-epimerase; 13, dTDP-4-keto-L-rhamnose reductase; 14, Fructokinase; 15, 6-phosphofructokinase; 16,
Phosphoglucose isomerase; 17, glutamine-fructose-6-phosphate transaminase; 18, Phosphoglucosamine mutase; 19 & 20, N-acetylglucosamine-1-phosphate
uridyltransferase (bifunctional); 21, UDP-galactopyranose mutase.

Figure 4 | Comparison of eps gene cluster among S. thermophilus ASCC 1275 and other five sequenced S. thermophilus. The predicted functions of
each color-coded ORF (intact or truncated) are indicated in the bottom panel. The size of each ORF in the eps gene cluster is indicated in each
pentagon (intact) or chevron (truncated).

Figure 5 | Structure of CRISPR/Cas loci in S. thermophilus ASCC 1275. DR, direct repeat. The four different DRs are color-coded in black and
spacers in other colors. The consensus sequence and size of the four DRs are indicated at the lower right of each locus. The size of each
CRISPR-associated protein in the locus is indicated in each pentagon (intact) or chevron (truncated).
CRISPR/Cas locus 1 (position: 817,355-825,392 bp): Csn1, Cas1, Cas2, Csn2, CRISPR1 (32 spacers). DR consensus (36 bp): GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC
CRISPR/Cas locus 2 (position: 1,073,432-1,082,060 bp): Cas1, Cas2, CRISPR2 (3 spacers), then Cas6 and Csm/Cmr genes.
CRISPR/Cas locus 3: includes CasC and CasB. DR consensus (28 bp): GGATCACCCCCGCGTGTGCGGGAAAAAC
CRISPR/Cas locus 4 (position: 1,558,841-1,566,015 bp): CRISPR4 (12 spacers), Csn2, Cas2, Cas1, Csn1. DR consensus (36 bp): GTTTTGGAACCATTCGAAACAACACAGCTCTAAAAC

Interestingly, among all the sequenced strains of S. thermophilus, ST 1275 contains the highest number
of CRISPR/Cas loci, with four CRISPR loci and 24 CRISPR-associated protein (cas) genes, including two
truncated cas genes. In general, three CRISPR loci are located downstream of the cas genes, while
CRISPR2 is located in the middle of the cas genes in CRISPR/Cas locus 2. Moreover, the four CRISPR loci
have three different spacer numbers and four different consensus sequences of direct repeats (DRs).
These diverse CRISPR/Cas loci suggest that ST 1275 may have better adaptive immunity against
different bacteriophages than other sequenced S. thermophilus strains. This is important for the
industrial manufacture of dairy products that use this organism. In particular, the CRISPR1 locus has
the highest number of DRs and spacers of the four loci. This suggests a possibly effective
defense mechanism that integrates novel spacers into CRISPR1 when ST 1275 is exposed to
bacteriophages 53 . The CRISPR2 locus is likely to make a limited contribution to the bacteriophage
response because it has fewer spacers. It has been demonstrated that increased
expression of casl and casl gene was indicative of higher activity 
in S. thermophilus LMD-9 during bacteriophage response 12 . Thus, 
the distribution of casl or casl gene in four CRISPR/Cas loci may 
confer their active roles in defense system. 
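
The repeat-spacer organisation described above can be made concrete with a short Python sketch that recovers spacers from a CRISPR array by splitting it on the direct repeat. This is an illustration only, not the algorithm used by CRISPR finder: it assumes exact copies of the repeat, whereas real arrays may contain degenerate repeats. The DR consensus is the 36-bp sequence reported for locus 1 in Fig. 5; the spacer sequences and the function name are hypothetical.

```python
def extract_spacers(array_seq, repeat):
    """Split a CRISPR array on exact copies of the direct repeat and
    return the non-empty sequences lying between consecutive repeats."""
    if not repeat:
        raise ValueError("repeat must be a non-empty string")
    parts = array_seq.split(repeat)
    # parts[0] and parts[-1] are the flanking sequences outside the array;
    # only the inner pieces have a repeat on both sides.
    return [part for part in parts[1:-1] if part]


# DR consensus of CRISPR/Cas locus 1 (36 bp), as reported in Fig. 5.
DR_LOCUS1 = "GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC"

# Hypothetical 30-bp spacers, for illustration only.
toy_spacers = ["ACGT" * 7 + "AC", "TTGCA" * 6]

# An array with n spacers contains n + 1 repeats.
toy_array = DR_LOCUS1 + DR_LOCUS1.join(toy_spacers) + DR_LOCUS1
found = extract_spacers(toy_array, DR_LOCUS1)

print(f"{len(found)} spacers recovered from a {len(toy_array)}-bp array")
```

Because an array with n spacers carries n + 1 repeats, its length is (n + 1) times the DR length plus the summed spacer lengths, which is why a locus with many spacers, such as CRISPR1, is also the longest array.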

Discussion 

Because EPS produced by dairy starter bacteria is important to
the quality of fermented dairy foods, attention has been paid to
novel EPS-producing starters, especially an essential starter Strepto- 
coccus thermophilus 1,6,73 '. Although several S. thermophilus genomes 
are available, their EPS yields have not been reported, possibly due to their
commercial nature or low EPS yield 10-14 . So far, numerous studies have been carried out to
identify and characterize eps gene clusters in high EPS-producing LAB, while no genomic data are
available for high EPS-producing starter bacteria, especially for an important organism such as
dairy S. thermophilus 33,34 . To the best of our knowledge, ST 1275 produces the highest known
amount of EPS (~1,000 mg/L) under optimal conditions in milk compared with other well-documented
EPS-producing S. thermophilus strains (Supplementary Table 6); however, the regulatory mechanism
governing its EPS yield remains poorly understood and merits further study 26 . Hence, it was of
interest to gain insight, at the genomic level, into ST 1275 as a model high EPS-producing starter
within S. thermophilus.

In general, S. thermophilus cannot take up lactose via a lactose-PTS but does so via the
lactose/galactose permease (LacS). Lactose is then hydrolyzed into glucose and galactose by
β-D-galactosidase, and galactose is excreted into the extracellular medium by LacS, resulting in a
high concentration of residual galactose in milk after fermentation 6 . Hence, little galactose is
utilized by S. thermophilus, as galactose is mainly metabolized for the synthesis of nucleotide
sugars for EPS production. However, the EPS yield of S. thermophilus strains is very limited.
Residual galactose in cheeses such as Mozzarella leads to browning during the baking of pizza made
with such cheeses 54 . Thus, a high EPS-producing S. thermophilus could be an ideal choice for
reducing residual galactose in milk, as well as for improving the texture of dairy foods.

Since the eps gene cluster for the assembly of EPS repeating units and nucleotide sugar biosynthesis
are the two factors that directly influence EPS yield, we paid attention to both of them in ST 1275
genome. Interestingly, the occurrence of eps1C-eps1D and eps2C-eps2D (Fig. 4), assigned to chain
length determination, indicates that this organism may assemble two types of EPS of different
molecular size. Based on our previous finding that ST 1275 is a mixed producer of capsular and
ropy EPS 20,27 , we conclude that ST 1275 produces at least two types of EPS. However, further work
to determine the chemical structure of the EPSs from ST 1275 would be important. Additionally,
previous studies have demonstrated that increased expression of genes involved in nucleotide sugar
biosynthesis improved EPS production by LAB, including S. thermophilus 9,54 . However, very limited
information is available on the expression of genes in the eps gene cluster for EPS assembly. Since
LAB commonly have only one epsC-epsD gene pair for chain length determination, the occurrence of two
epsC-epsD gene pairs indicates a complex regulation of EPS production in ST 1275. In general, EPS
production by LAB is cell growth-associated. However, our previous study found that optimization of
cultivation conditions such as pH, temperature and addition of whey protein concentrate resulted in
a large increase in EPS yield, while no effect was observed on the cell growth of ST 1275 26 . This
implies that the expression of genes for nucleotide sugar biosynthesis or EPS assembly possibly
changed in ST 1275 under optimal conditions. Thus, the mechanism regulating EPS yield in ST 1275
merits further investigation.

Comparisons of common features, including carbohydrate utilization, the proteolytic system, stress
response systems and the defense system, among the sequenced S. thermophilus strains suggest that ST
1275 may serve as a model for high EPS-producing dairy starter bacteria. Specifically, this strain
may possess an effective proteolytic system, which contributes to its adaptation to milk and to the
rapid acidification of milk. Acid resistance conferred by the unique urease system in ST 1275 may
improve cell viability in extremely acidic conditions such as those in yogurt. The four independent
CRISPR/Cas loci may be effective in controlling phage infection. The abundance of multidrug
resistance genes and a mucus-binding protein on the cell surface may allow ST 1275 to serve as a
probiotic candidate that survives in and colonizes the gut and improves gut homeostasis. The
elucidation of the ST 1275 genome makes this organism a model dairy starter bacterium for high EPS
yield among S. thermophilus strains.

Methods 

Bacterial strain and culture conditions. S. thermophilus ASCC 1275 (ST 1275), a
typical dairy starter bacterium, was obtained from the Australian Starter Culture
Research Center (ASCRC; now Dairy Innovation Australia Limited, Werribee,
Victoria, Australia). This organism was stored at -80°C in 10% (w/v) reconstituted
skim milk containing 20% (v/v) glycerol and was activated by growing anaerobically
on M17 agar (BD Company, NJ, USA) at 37°C for 24 h. After successful activation, a
typical individual colony was inoculated into M17 broth containing 1% lactose and
incubated anaerobically at 37°C for 18 h. Cells were then harvested for genomic
DNA extraction.

Genomic DNA extraction. Genomic DNA was extracted from ST 1275 using the 
CTAB/NaCl method according to the protocol from DOE Joint Genome Institute 
(JGI, http://my.jgi.doe.gov/general/protocols.html). Briefly, bacterial cultures were 
harvested by centrifugation, re-suspended in TE buffer containing lysozyme, SDS and 
Proteinase K, and incubated at 37°C for 1 h, followed by steps including addition of 
CTAB/NaCl (pre-warmed to 65°C), incubation at 65°C for 10 min, and DNA 
purification using phenol/chloroform/isopropanol (25/24/1, v/v/v). Genomic DNA 
was precipitated with isopropanol and washed with 70% ethanol.
Finally, the DNA pellet was dried and resuspended in TE buffer containing 0.1 mg/mL of
RNase. The concentration and quality of genomic DNA were then measured with a
Nanodrop-1000 UV/Vis spectrophotometer (NanoDrop Technologies, DE, USA).

De novo shotgun paired-end pyrosequencing and genome assembly. Shotgun
sequencing, paired-end pyrosequencing and Sanger sequencing were carried out to
generate the whole genome of ST 1275 55,56 . Briefly, shotgun sequencing was
performed on a 454 GS Junior System (Roche Diagnostics, CT, USA) with a GS FLX
titanium rapid library preparation kit according to the manufacturer's instructions
(Roche Diagnostics). One additional paired-end pyrosequencing run was carried out
with an 8 kb-span library to produce a draft genome. The raw reads were assembled de novo
into contigs using Newbler 2.7 (Roche Diagnostics). To complete the
whole genome of ST 1275, primers were designed and gaps in the draft genome were
filled by sequencing PCR products on an ABI 3730 capillary sequencer.

Gene prediction and annotation. Gene annotation was carried out using the NCBI
Prokaryotic Genome Annotation Pipeline 56 . GLIMMER v3.02 was used for coding sequence (CDS)
prediction 57 . BLASTp was used to align the amino acid sequences against the NCBI non-redundant
database. Amino acid sequences encoded by predicted genes were searched against all proteins
from complete microbial genomes. Hits with an alignment length over 90% of the query length and
over 60% identity were retained, and the best BLAST hit, with the highest alignment
length percentage and identity, was assigned as the annotation of the predicted
gene 55 . Further annotation was obtained using the SEED-based automated
annotation system provided by the RAST server 58 .
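
The hit-selection rule described above (alignment length over 90% of the query length, identity over 60%, then the best remaining hit) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the `Hit` record, the subject identifiers and the tie-breaking order (coverage first, then identity) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    subject: str          # hypothetical subject protein identifier
    aligned_length: int   # number of query residues covered by the alignment
    identity: float       # percent identity over the alignment


def best_annotation(query_length, hits, min_coverage=90.0, min_identity=60.0):
    """Return the best hit passing both thresholds, or None if none passes."""
    passing = []
    for hit in hits:
        coverage = 100.0 * hit.aligned_length / query_length
        if coverage > min_coverage and hit.identity > min_identity:
            passing.append((coverage, hit.identity, hit))
    if not passing:
        return None
    # Rank by coverage, then by identity (this ordering is an assumption).
    return max(passing, key=lambda item: (item[0], item[1]))[2]


# Hypothetical hits for a 200-residue query protein.
example_hits = [
    Hit("protA", 195, 72.0),  # 97.5% coverage, passes
    Hit("protB", 198, 65.0),  # 99.0% coverage, passes
    Hit("protC", 150, 99.0),  # 75.0% coverage, fails the coverage threshold
    Hit("protD", 199, 55.0),  # fails the identity threshold
]
chosen = best_annotation(200, example_hits)
print(chosen.subject if chosen else "no annotation")
```

Ranking by coverage before identity is only one way to read "highest alignment length percentage and match identity"; a combined score would be an equally plausible choice.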



Bioinformatic analysis. CRISPR finder, an online tool (http://crispr.u-psud.fr/Server/), was used
to identify CRISPR/Cas systems. Ortholog assignment and metabolic pathway mapping for ST 1275 were
performed on the amino acid sequences of CDSs using the KEGG Automatic Annotation Server (KAAS;
http://www.genome.jp/tools/kaas/), an online service based on the bi-directional best hit (BBH)
method.
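
The idea behind the bi-directional best hit (BBH) method can be shown with a small Python sketch: two sequences are paired only if each is the other's top-scoring hit. This is a simplified illustration and not the KAAS implementation, which applies additional scoring criteria; the gene names, ortholog identifiers and scores below are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.
    scores has the form {query: {subject: score}}."""
    return {q: max(subs, key=subs.get) for q, subs in scores.items() if subs}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores: query genes against reference orthologs...
genes_vs_refs = {
    "geneA1": {"K1": 250.0, "K2": 90.0},
    "geneA2": {"K2": 180.0},
    "geneA3": {"K2": 200.0, "K3": 60.0},
}
# ...and the reverse search, reference orthologs against query genes.
refs_vs_genes = {
    "K1": {"geneA1": 250.0},
    "K2": {"geneA3": 200.0, "geneA2": 180.0},
    "K3": {"geneA3": 60.0},
}
pairs = bidirectional_best_hits(genes_vs_refs, refs_vs_genes)
print(pairs)
```

In this toy data geneA2's best hit is K2, but K2's best hit is geneA3, so geneA2 receives no assignment; requiring reciprocity is what makes BBH more conservative than a one-way best-hit search.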